Finding cohesive communities with C

Authors

  • Adrien Friggeri
  • Eric Fleury

Abstract

Social communities have drawn a lot of attention in the past decades. We have previously introduced and validated the use of the cohesion, a graph metric which quantitatively captures the community-ness, in a social sense, of a set of nodes in a graph. Here we show that the problem of maximizing this quantity is NP-hard. Furthermore, we show that the dual problem of minimizing this quantity, for a fixed set size, is also NP-hard. We then propose a heuristic to optimize the cohesion, which we apply to the graph of voting agreement between U.S. Senators. Finally, we assess the validity of the approach by analyzing the resulting agreement communities.

Keywords: graph theory, community detection, NP-completeness, cohesion, complexity, social network analysis

hal-00692548, version 1 - 30 Apr 2012

In [1], we introduced a new metric, called the cohesion, which rates the community-ness of a group of people in a social network from a sociological point of view.
Rather than looking at the proportion of edges falling inside and between communities, the cohesion takes into account the triads in the network: it defines a community as a subgraph that has high transitivity and few triangles pointing outwards to the rest of the network. Through a large-scale experiment on Facebook, we established that the cohesion is highly correlated with users' subjective perception of communities. In this article, we show that finding a set of vertices with maximum cohesion is NP-hard. We then establish that the dual problem of finding the least cohesive groups of a given size in a graph is also NP-hard. Next, we introduce C, a heuristic that covers a given graph with cohesive communities by pseudo-greedily expanding around selected edges. Finally, we validate this heuristic by studying the communities it yields on the agreement graph of U.S. Senators, showing that the communities obtained independently for each Congress session are stable through time and can be identified with political parties.

Notations

Let $G = (V, E)$ be a graph with vertex set $V$ and edge set $E$, where $n = |V| \geq 4$. For every vertex $u \in V$, we write $d_G(u)$, or more simply $d(u)$, for the degree of $u$, and for every set $S \subseteq V$ we write $N(S)$ for the set of neighbors of $S$. A triangle in $G$ is a triplet of pairwise connected vertices. For every set of vertices $S \subseteq V$, let $G[S] = (S, E_S)$ be the subgraph induced by $S$ in $G$. We write $m(S) = |E_S|$ for the number of edges in $G[S]$, and

$$\triangle_i(S) = \left|\{\{u, v, w\} \subseteq S : uv, vw, uw \in E_S\}\right|$$

for the number of triangles in $G[S]$. We define

$$\triangle_o(S) = \left|\{\{u, v, w\} : u, v \in S,\ w \in V \setminus S,\ uv, vw, uw \in E\}\right|,$$

the number of outbound triangles of $S$, that is, triangles in $G$ which have exactly two vertices in $S$. Finally, we recall the definition of the cohesion:

$$C(S) = \frac{\triangle_i(S)}{\binom{|S|}{3}} \cdot \frac{\triangle_i(S)}{\triangle_i(S) + \triangle_o(S)}.$$

The cohesion is a measure of the community-ness of a set of nodes. The first factor is the density of triangles inside $S$; the second is the fraction of triangles with at least two vertices in $S$ that lie entirely inside $S$. The cohesion is thus a compromise between a high density of triangles inside the community and a small number of triangles pointing outwards from the community.
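The definitions of inner triangles, outbound triangles and cohesion translate directly into code. The sketch below is written in Python, because the document has no code of its own. It counts both kinds of triangle by brute force and evaluates C(S).

Several parts of it are assumptions rather than material from the text:

- The graph is undirected and simple, stored as a dict of neighbor sets.
- All helper names (`build_adjacency`, `inner_triangles`, `outbound_triangles`, `cohesion`, `greedy_expand`) are hypothetical.
- C(S) is taken to be 0 when S has fewer than three vertices or contains no triangle. The text does not state this convention.
- `greedy_expand` is a plain greedy expansion from a seed edge, included only to illustrate growing a set around an edge. It is not the authors' pseudo-greedy heuristic C, whose details are not given here.

```python
from collections import defaultdict
from itertools import combinations
from math import comb


def build_adjacency(edges):
    """Undirected simple graph as a dict mapping each vertex to its neighbor set."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def inner_triangles(adj, S):
    """Triangles of G[S]: 3-subsets of S whose three pairs are all edges."""
    S = set(S)
    return sum(
        1
        for u, v, w in combinations(S, 3)
        if v in adj[u] and w in adj[u] and w in adj[v]
    )


def outbound_triangles(adj, S):
    """Triangles of G with exactly two vertices in S.

    Each such triangle is counted once, through its unique edge {u, v} inside S.
    """
    S = set(S)
    return sum(
        len((adj[u] & adj[v]) - S)  # common neighbors w lying outside S
        for u, v in combinations(S, 2)
        if v in adj[u]
    )


def cohesion(adj, S):
    """C(S) = t_in / binom(|S|, 3) * t_in / (t_in + t_out).

    Assumption (not stated in the text): C(S) = 0 if |S| < 3 or S has no triangle.
    """
    S = set(S)
    if len(S) < 3:
        return 0.0
    t_in = inner_triangles(adj, S)
    if t_in == 0:
        return 0.0
    t_out = outbound_triangles(adj, S)
    return (t_in / comb(len(S), 3)) * (t_in / (t_in + t_out))


def greedy_expand(adj, u, v):
    """Hypothetical plain greedy expansion from edge uv (NOT the authors' C).

    Repeatedly adds the neighboring vertex giving the highest cohesion and
    stops as soon as no candidate strictly improves it.
    """
    S = {u, v}
    best = cohesion(adj, S)  # 0.0 for a single edge under the convention above
    while True:
        candidates = set().union(*(adj[x] for x in S)) - S
        scored = [(cohesion(adj, S | {w}), w) for w in candidates]
        if not scored:
            break
        score, w = max(scored)  # ties broken by the larger vertex label
        if score <= best:
            break
        S.add(w)
        best = score
    return S, best


# Example: a 4-clique {0, 1, 2, 3} with a triangle {3, 4, 5} attached at vertex 3.
adj = build_adjacency([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
                       (3, 4), (3, 5), (4, 5)])
print(cohesion(adj, {0, 1, 2, 3}))  # 1.0
print(cohesion(adj, {0, 1, 2}))     # 0.25: three outbound triangles through vertex 3
print(greedy_expand(adj, 0, 1))     # ({0, 1, 2, 3}, 1.0)
```

In this example the clique {0, 1, 2, 3} has cohesion 1. All four of its triangles are internal, and none of its edges closes a triangle with an outside vertex.

The triangle counts are recomputed from scratch for every candidate. That is acceptable for illustration but costly on real graphs. The natural optimization is to update the two counts incrementally when a single vertex is added.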

Publication date: 2012